Data center pressure anomaly detection and remediation

ABSTRACT

A system is described that can detect pressure anomalies within a data center, generate an alert when anomalies are detected, and initiate remediative actions. The system monitors each of a plurality of fans used to dissipate heat generated by one or more servers to obtain data that indicates how an actual speed of each of the fans relates to a target speed thereof. The system compares the obtained data to reference data that indicates, for each of the plurality of fans, how an actual speed of the fan relates to a target speed thereof in a substantially pressure-neutral environment. Based on the comparison, the system determines whether or not a pressure anomaly exists. If the system determines that a pressure anomaly exists, then the system may perform various actions such as generating an alert and modifying a manner of operation of one or more of the fans or servers.

BACKGROUND

In data centers that utilize hot aisle containment units for environmental control, macroscopic pressure issues occasionally arise. For example, if the fans that operate to cool a group of servers cause air to be blown into a hot aisle containment unit at a rate that exceeds the rate at which air can be removed therefrom, then the hot aisle containment unit will become pressurized relative to adjacent cold aisle(s). As the pressure increases, the hot air in the hot aisle containment unit can push out of containment panels and other openings into the cold aisle(s) and be drawn into nearby servers. These servers can exceed their specified thermal values and trip thermal alarms or possibly become damaged due to excessive heat. Excessive pressure in the hot aisle containment unit can also cause other environmental control components to be damaged. For example, exhaust fans that are used to draw hot air out of the hot aisle containment unit could potentially be damaged by the excessive pressure.

SUMMARY

A system is described herein that is operable to automatically detect pressure anomalies within a data center, to generate an alert when such anomalies are detected, and to initiate actions to remediate the anomalies. In accordance with embodiments, the system monitors each of a plurality of fans used to dissipate heat generated by one or more servers in the data center. The fans may comprise, for example, server fans or blade chassis fans that blow air into a hot aisle containment unit. Through such monitoring, the system obtains data that indicates how an actual speed of each of the fans relates to a target speed of each of the fans. The system then compares the obtained data to reference data that indicates, for each of the plurality of fans, how an actual speed of the fan relates to a target speed of the fan in a substantially pressure-neutral environment. Based on the comparison, the system determines whether or not a pressure anomaly exists in the data center. If the system determines that a pressure anomaly exists in the data center, then the system may generate an alert and/or take steps to remediate the anomaly. Such steps may include, for example, modifying a manner of operation of one or more of the fans and/or modifying a manner of operation of one or more of the servers.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Moreover, it is noted that the claimed subject matter is not limited to the specific embodiments described in the Detailed Description and/or other sections of this document. Such embodiments are presented herein for illustrative purposes only. Additional embodiments will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein.

BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES

The accompanying drawings, which are incorporated herein and form part of the specification, illustrate embodiments of the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the relevant art(s) to make and use the invention.

FIG. 1 is a perspective view of an example hot aisle containment system that may be implemented in a data center and that may benefit from the pressure anomaly detection and remediation embodiments described herein.

FIG. 2 is a side view of another example hot aisle containment system that may be implemented in a data center and that may benefit from the pressure anomaly detection and remediation embodiments described herein.

FIG. 3 is a block diagram of an example data center management system that is capable of automatically detecting pressure anomalies by monitoring server fans in a data center and taking certain actions in response to such detection.

FIG. 4 is a block diagram of an example data center management system that is capable of automatically detecting pressure anomalies by monitoring blade server chassis fans in a data center and taking certain actions in response to such detection.

FIG. 5 depicts a flowchart of a method for generating fan reference data that indicates, for each of a plurality of fans, how an actual speed of the fan relates to a target speed of the fan in a substantially pressure-neutral environment.

FIG. 6 depicts a flowchart of a method for automatically detecting a pressure anomaly within a data center in accordance with an embodiment.

FIG. 7 depicts a flowchart of a method for automatically taking actions in response to the detection of a pressure anomaly within a data center in accordance with an embodiment.

FIG. 8 is a block diagram of an example processor-based computer system that may be used to implement various embodiments.

The features and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout. In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The drawing in which an element first appears is indicated by the leftmost digit(s) in the corresponding reference number.

DETAILED DESCRIPTION

I. Introduction

The following detailed description refers to the accompanying drawings that illustrate exemplary embodiments of the present invention. However, the scope of the present invention is not limited to these embodiments, but is instead defined by the appended claims. Thus, embodiments beyond those shown in the accompanying drawings, such as modified versions of the illustrated embodiments, may nevertheless be encompassed by the present invention.

References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” or the like, indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Furthermore, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of persons skilled in the relevant art(s) to implement such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

A system is described herein that is operable to automatically detect pressure anomalies within a data center, to generate an alert when such anomalies are detected, and to initiate actions to remediate the anomalies. In accordance with embodiments, the system monitors each of a plurality of fans used to dissipate heat generated by one or more servers in the data center. The fans may comprise, for example, server fans or blade chassis fans that blow air into a hot aisle containment unit. Through such monitoring, the system obtains data that indicates how an actual speed of each of the fans relates to a target speed of each of the fans. The system then compares the obtained data to reference data that indicates, for each of the plurality of fans, how an actual speed of the fan relates to a target speed of the fan in a substantially pressure-neutral environment. Based on the comparison, the system determines whether or not a pressure anomaly exists in the data center. If the system determines that a pressure anomaly exists in the data center, then the system may generate an alert and/or take steps to remediate the anomaly. Such steps may include, for example, modifying a manner of operation of one or more of the fans, and/or modifying a manner of operation of one or more of the servers.

Section II describes example hot aisle containment systems that may be implemented in a data center and technical problems that may arise when using such systems. Section III describes example data center management systems that can help solve such technical problems by automatically detecting pressure anomalies in a data center, raising alerts about such anomalies, and taking actions to remediate such anomalies. Section IV describes an example processor-based computer system that may be used to implement various embodiments described herein. Section V describes some additional exemplary embodiments. Section VI provides some concluding remarks.

II. Example Hot Aisle Containment Systems and Problems Associated Therewith

FIG. 1 is a perspective view of an example hot aisle containment system 100 that may be implemented in a data center and that may benefit from the pressure anomaly detection and remediation embodiments described herein. Hot aisle containment system 100 may be installed in a data center for a variety of reasons including, but not limited to, protecting data center computing equipment, conserving energy, and reducing cooling costs by managing air flow. Hot aisle containment system 100 is merely representative of one type of hot aisle containment system. Persons skilled in the relevant art(s) will appreciate that a wide variety of other approaches to implementing a hot aisle containment system may be taken, and that hot aisle containment systems implemented in accordance with such other approaches may also benefit from the pressure anomaly detection and remediation embodiments described herein.

As shown in FIG. 1, example hot aisle containment system 100 includes a plurality of server cabinets 106 ₁-106 ₁₄ disposed on a floor 102 of a data center. Each server cabinet 106 ₁-106 ₁₄ is configured to house a plurality of servers. Each server has at least one cold air intake and at least one hot air outlet. Each server is situated in a server cabinet such that the cold air intake(s) thereof are facing or otherwise exposed to one of two cold aisles 112, 114 while the hot air outlet(s) thereof are facing or otherwise exposed to a hot aisle 116. The physical structure of server cabinets 106 ₁-106 ₁₄, the servers housed therein, and doors 108, 110 serve to isolate the air in cold aisles 112, 114 from the air in hot aisle 116. Still other structures or methods may be used to provide isolation between cold aisles 112, 114 and hot aisle 116. For example, foam or some other material may be inserted between the servers and the interior walls of server cabinets 106 ₁-106 ₁₄ to provide further isolation between the air in cold aisles 112, 114 and the air in hot aisle 116. Additionally, in scenarios in which gaps exist between any of server cabinets 106 ₁-106 ₁₄ or between any of server cabinets 106 ₁-106 ₁₄ and the floor/ceiling, panels or other physical barriers may be installed to prevent air from flowing between hot aisle 116 and cold aisles 112, 114 through such gaps. The enclosure created around the hot aisle by these various structures may be referred to as a “hot aisle containment unit.”

A cooling system (not shown in FIG. 1) produces cooled air 118 which is circulated into each of cold aisles 112, 114 via vents 104 in floor 102. A variety of other methods for circulating cooled air 118 into cold aisles 112, 114 may be used. For example, cooled air 118 may be circulated into each of cold aisles 112, 114 via vents in the walls at the end of a server row or vents in the ceiling. Fans integrated within the servers installed in server cabinets 106 ₁-106 ₁₄ operate to draw cooled air 118 into the servers via the cold air intakes thereof. Cooled air 118 absorbs heat from the servers' internal components, thereby becoming heated air 120. Such heated air 120 is expelled by the server fans into hot aisle 116 via the server hot air outlets.

It is noted that the fans that draw cooled air 118 toward the server components and expel heated air 120 away from the server components need not be integrated within the servers themselves, but may also be externally located with respect to the servers. For example, in a scenario in which the servers comprise blade servers installed within a blade server chassis, the blade server chassis may itself include one or more fans that operate to draw cooled air 118 toward the blade servers and their components via one or more chassis cold air intakes and to expel heated air 120 away from the blade servers and their components via one or more chassis hot air outlets.

Heated air 120 within hot aisle 116 may be drawn therefrom by one or more exhaust fans or other airflow control mechanisms (not shown in FIG. 1). For example, heated air 120 may be drawn out of hot aisle 116 via vents in a ceiling disposed over hot aisle 116 and routed elsewhere using a system of ducts. Depending upon the implementation, heated air 120 or a portion thereof may be routed back to the cooling system to be cooled thereby and recirculated into cold aisles 112, 114. Heated air 120 or a portion thereof may also be vented from the data center to the outside world, or in colder climates, redirected back into the data center or an adjacent building or space to provide heating. In any case, direct recirculation of heated air 120 into cold aisles 112, 114 is substantially prevented. This helps to ensure that the temperature of the air that is drawn into the servers is kept at a level that does not exceed the operational specifications thereof, thereby avoiding damage to the servers' internal components. The foregoing features of hot aisle containment system 100 may also improve the energy efficiency of the data center and reduce cooling costs.

FIG. 2 is a side view of another example hot aisle containment system 200 that may be implemented in a data center and that may benefit from the pressure anomaly detection and remediation embodiments described herein. Like hot aisle containment system 100, hot aisle containment system 200 is also representative of merely one type of hot aisle containment system.

As shown in FIG. 2, example hot aisle containment system 200 includes a plurality of server cabinets 206 disposed on a floor 202 of a data center. Each server cabinet 206 is configured to house a plurality of servers. Each server has at least one cold air intake and at least one hot air outlet. Each server is situated in a server cabinet 206 such that the cold air intake(s) thereof are facing or otherwise exposed to one of two cold aisles 212, 214 while the hot air outlet(s) thereof are facing or otherwise exposed to a hot aisle 216. The physical structure of server cabinets 206 and the servers housed therein serve to isolate the air in cold aisles 212, 214 from the air in hot aisle 216. Still other structures or methods may be used to provide isolation between cold aisles 212, 214 and hot aisle 216. For example, as further shown in FIG. 2, panels 208 may be installed between the tops of server cabinets 206 and ceiling 204 to further isolate the air in cold aisles 212, 214 from the air in hot aisle 216.

A computer room air conditioner (CRAC) 210 produces cooled air 218 that is blown into one or more channels that run under floor 202. Such cooled air 218 passes from these channel(s) into cold aisles 212, 214 via vents 222 in floor 202, although other means for venting cooled air 218 into cold aisles 212, 214 may be used. CRAC 210 may represent, for example, an air-cooled CRAC, a glycol-cooled CRAC or a water-cooled CRAC. Still other types of cooling systems may be used to produce cooled air 218, including but not limited to a computer room air handler (CRAH) and chiller, a pumped refrigerant heat exchanger and chiller, or a direct or indirect evaporative cooling system.

Fans integrated within the servers installed in server cabinets 206 operate to draw cooled air 218 into the servers via the cold air intakes thereof. Cooled air 218 absorbs heat from the servers' internal components, thereby becoming heated air 220. Such heated air 220 is expelled by the server fans into hot aisle 216 via the server hot air outlets. The fans that draw cooled air 218 toward the server components and expel heated air 220 away from the server components need not be integrated within the servers themselves, but may also be externally located with respect to the servers (e.g., such fans may be part of a blade server chassis).

Heated air 220 within hot aisle 216 may be drawn out of hot aisle 216 by one or more exhaust fans or other airflow control mechanisms (not shown in FIG. 2). For example, heated air 220 may be drawn out of hot aisle 216 via vents 224 in a ceiling 204 disposed over hot aisle 216 and routed via one or more channels back to CRAC 210 to be cooled thereby and recirculated into cold aisles 212, 214. A portion of heated air 220 may also be vented from the data center to the outside world, or in colder climates, redirected back into the data center or an adjacent building or space to provide heating. In any case, direct recirculation of heated air 220 into cold aisles 212, 214 is substantially prevented. This helps to ensure that the temperature of the air that is drawn into the servers is kept at a level that does not exceed the operational specifications thereof, thereby avoiding damage to the servers' internal components. The foregoing features of hot aisle containment system 200 may also improve the energy efficiency of the data center and reduce cooling costs.

To obtain desired airflow, hot aisle containment systems 100, 200 may each be configured to maintain a slight negative pressure in the hot aisle. This may be achieved, for example, by utilizing one or more exhaust fans to draw a slightly greater volume of air per unit of time out of the hot aisle than is normally supplied to it during the same unit of time. By maintaining a slightly negative pressure in the hot aisle, air will tend to flow naturally from the cold aisles to the hot aisle. Furthermore, when a slightly negative pressure is maintained in the hot aisle, the server or blade chassis fans used to cool the servers' internal components will not be required to work as hard to blow or draw air over those components as they would if the pressure in the hot aisle exceeded that in the cold aisles.

Problems can occur, however, when a large group of server or blade chassis fans begin to operate in unison, thereby causing an extraordinary amount of air to flow into the hot aisle. This might occur for a variety of reasons. For example, such behavior could be caused by an increase in the ambient temperature in the cold aisles (e.g., if the normal ambient temperature in the cold aisles is 73° F., and the data center raises the ambient temperature to 85° F. due to a heat wave in the area). As another example, such behavior could be caused by a problem with the algorithms used to control server or blade chassis fan speeds. There may be still other causes. Regardless of the cause, such airflow into the hot aisle may surpass the exhaust capabilities of the hot aisle containment system, thereby generating a higher pressure in the hot aisle than in the cold aisles. A similar situation may arise if one or more exhaust fans that operate to draw air from the hot aisle stop working or become otherwise incapable of removing air from the hot aisle at the rate air is being blown into the hot aisle.
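As a rough illustration of the imbalance described above, the following sketch compares the total airflow blown into a hot aisle with the rate at which exhaust fans can remove it. The function name, the units (cubic feet per minute), and the sample figures are assumptions chosen for illustration only; none of them come from the described embodiments.

```python
def hot_aisle_flow_margin(supply_cfm, exhaust_capacity_cfm):
    """Return exhaust capacity minus total supplied airflow (assumed units: CFM).

    A positive margin corresponds to the slightly negative hot aisle pressure
    described above. A negative margin means more air is being blown into the
    hot aisle than can be removed, so pressure will tend to build.
    """
    return exhaust_capacity_cfm - sum(supply_cfm)


# Illustrative figures only: 40 fans at 50 CFM each against 2,100 CFM of exhaust.
normal = hot_aisle_flow_margin([50.0] * 40, 2100.0)

# The same 40 fans ramping up in unison to 60 CFM each.
ramped = hot_aisle_flow_margin([60.0] * 40, 2100.0)
```

With these assumed figures the margin is positive in the first case and negative in the second, which is the condition under which heated air may begin to leak into the cold aisles.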

In these types of situations, as the pressure increases in the hot aisle, the heated air in the hot aisle containment unit can push out of containment panels and other openings into the cold aisle(s) and be drawn into nearby servers. These servers can exceed their specified thermal values and trip thermal alarms or possibly become damaged due to excessive heat. Excessive pressure in the hot aisle containment unit can also cause other environmental control components to be damaged. For example, exhaust fans that are used to draw hot air out of the hot aisle containment unit could potentially be damaged by the excessive pressure. In the following section, various data center management systems will be described that can help address such problems by automatically detecting pressure anomalies in a data center that includes a hot aisle containment system and taking actions to remediate such anomalies before equipment damage occurs.

III. Example Data Center Management Systems that Perform Automated Pressure Anomaly Detection and Remediation

FIG. 3 is a block diagram of an example data center management system 300 that is capable of automatically detecting pressure anomalies in a data center and taking certain actions in response thereto. System 300 may be implemented, for example and without limitation, to detect and remediate pressure anomalies in a data center that implements a hot aisle containment system such as hot aisle containment system 100 discussed above in reference to FIG. 1, hot aisle containment system 200 discussed above in reference to FIG. 2, or some other type of hot aisle containment system.

As shown in FIG. 3, data center management system 300 includes a computing device 302 and a plurality of servers 306 ₁-306 _(N), each of which is connected to computing device 302 via a network 304. Computing device 302 is intended to represent a processor-based electronic device that is configured to execute software for performing certain data center management operations, some of which will be described herein. Computing device 302 may represent, for example, a desktop computer or a server. However, computing device 302 is not so limited, and may also represent other types of computing devices, such as a laptop computer, a tablet computer, a netbook, a wearable computer (e.g., a head-mounted computer), or the like.

Servers 306 ₁-306 _(N) represent server computers located within a data center. Generally speaking, each of servers 306 ₁-306 _(N) is configured to perform operations involving the provision of data to other computers. For example, one or more of servers 306 ₁-306 _(N) may be configured to provide data to client computers over a wide area network, such as the Internet. Furthermore, one or more of servers 306 ₁-306 _(N) may be configured to provide data to other servers, such as any other ones of servers 306 ₁-306 _(N) or any other servers inside or outside of the data center within which servers 306 ₁-306 _(N) reside. Each of servers 306 ₁-306 _(N) may comprise, for example, a special-purpose type of server, such as a Web server, a mail server, a file server, or the like.

Network 304 may comprise a data center local area network (LAN) that facilitates communication between each of servers 306 ₁-306 _(N) and computing device 302. However, this example is not intended to be limiting, and network 304 may comprise any type of network or combination of networks suitable for facilitating communication between computing devices. Network(s) 304 may include, for example and without limitation, a wide area network (e.g., the Internet), a personal area network, a private network, a public network, a packet network, a circuit-switched network, a wired network, and/or a wireless network.

As further shown in FIG. 3, server 306 ₁ includes a number of components. These components include one or more server fans 330, a fan control component 332, one or more fan speed sensors 334, and a data center management agent 336. It is to be understood that each of servers 306 ₂-306 _(N) includes instances of the same or similar components, but that these have not been shown in FIG. 3 due to space constraints and for ease of illustration.

Server fan(s) 330 comprise one or more mechanical devices that operate to produce a current of air. For example, each server fan 330 may comprise a mechanical device that includes a plurality of blades that are radially attached to a central hub-like component and that can revolve therewith to produce a current of air. Each server fan 330 may comprise, for example, a fixed-speed or variable-speed fan. Server fan(s) 330 are operable to generate airflow for the purpose of dissipating heat generated by one or more components of server 306 ₁. Server components that may generate heat include but are not limited to central processing units (CPUs), chipsets, memory devices, network adapters, hard drives, power supplies, or the like.

In one embodiment, server 306 ₁ includes one or more cold air intakes and one or more hot air outlets. In further accordance with such an embodiment, each server fan 330 may be operable to draw air into server 306 ₁ via the cold air intake(s) and to expel air therefrom via the hot air outlet(s). In still further accordance with such an embodiment, the cold air intake(s) may be facing or otherwise exposed to a data center cold aisle and the hot air outlet(s) may be facing or otherwise exposed to a data center hot aisle. In this embodiment, each server fan 330 is operable to draw cooled air into server 306 ₁ from the cold aisle and expel heated air therefrom into the hot aisle.

Fan control component 332 comprises a component that operates to control a speed at which each server fan 330 rotates. The fan speed may be represented, for example, in revolutions per minute (RPM), and may range from 0 RPM (i.e., server fan is off) to some upper limit. The different fan speeds that can be achieved by a particular server fan will vary depending upon the fan type. Fan control component 332 may be implemented in hardware (e.g., using one or more digital and/or analog circuits), as software (e.g., software executing on one or more processors of server 306 ₁), or as a combination of hardware and software.

Fan control component 332 may implement an algorithm for controlling the speed of each server fan 330. For example, fan control component 332 may implement an algorithm for selecting a target fan speed for each server fan 330 based on any number of ascertainable factors. For example, the target fan speed may be selected based on a temperature sensed by a temperature sensor internal to, adjacent to, or otherwise associated with server 306 ₁, or based on a determined degree of usage of one or more server components, although these are only a few examples. It is also possible that fan control component 332 may select a target fan speed for each server fan 330 based on external input received from a data center management tool or other entity, as will be discussed elsewhere herein.
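One way such a target-speed selection could look is sketched below. The linear temperature-to-speed curve, the parameter names, and all default values are hypothetical; the description does not prescribe any particular algorithm. The optional `override_rpm` argument stands in for external input from a data center management tool.

```python
def select_target_rpm(temp_c, min_rpm=1000, max_rpm=6000,
                      low_temp_c=30.0, high_temp_c=70.0, override_rpm=None):
    """Pick a target fan speed from a sensed temperature.

    Assumed policy (not taken from the description): a linear ramp between
    min_rpm at low_temp_c and max_rpm at high_temp_c. An externally supplied
    override, if present, takes precedence and is clamped to [0, max_rpm].
    """
    if override_rpm is not None:
        return max(0, min(max_rpm, override_rpm))
    if temp_c <= low_temp_c:
        return min_rpm
    if temp_c >= high_temp_c:
        return max_rpm
    fraction = (temp_c - low_temp_c) / (high_temp_c - low_temp_c)
    return round(min_rpm + fraction * (max_rpm - min_rpm))
```

Under these assumed defaults, a sensed temperature halfway between the two thresholds yields a target halfway between the minimum and maximum speeds, while an override lets a remote tool cap or force the speed regardless of temperature.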

Although only a single fan control component 332 is shown in FIG. 3, it is possible that server 306 ₁ may include multiple fan control components. For example, server 306 ₁ may include different fan control components that operate to control the speed of different server fan(s), respectively.

Fan speed sensor(s) 334 comprise one or more sensors that operate to determine an actual speed at which each server fan 330 is operating. The actual speed at which a particular server fan 330 operates may differ from a target speed at which the server fan is being driven to operate as determined by fan control component 332. For example, although fan control component 332 may determine that a particular server fan 330 should be driven to operate at a speed of 2,100 RPM, in reality the particular server fan 330 may be operating at a speed of 2,072 RPM. The difference between the target speed and the actual speed may be due to a number of factors, including the design of the server fan itself, the design of the components used to drive the server fan, as well as the ambient conditions in which the server fan is operating. For example, if a higher pressure exists at the hot air outlet(s) of server 306 ₁ than exists at the cold air intake(s) thereof, this may cause server fan(s) 330 to operate at a reduced actual speed relative to a desired target speed.
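The relationship between actual and target speed in the example above can be expressed numerically. The sketch below uses a simple ratio and an RPM shortfall; both measures and the function names are assumptions, since the description only requires data indicating how the actual speed relates to the target speed.

```python
def speed_ratio(actual_rpm, target_rpm):
    """Return actual/target speed, or None when the fan is commanded off.

    Using a ratio is an assumption; any measure that relates the actual
    speed to the target speed could be substituted.
    """
    if target_rpm <= 0:
        return None
    return actual_rpm / target_rpm


def speed_shortfall_rpm(actual_rpm, target_rpm):
    """Return how many RPM the fan falls short of its target."""
    return target_rpm - actual_rpm


# Figures from the example above: target 2,100 RPM, actual 2,072 RPM.
ratio = speed_ratio(2072, 2100)
shortfall = speed_shortfall_rpm(2072, 2100)
```

For the figures in the text, the fan runs 28 RPM below target, a ratio of roughly 0.987.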

Any type of sensor that can be used to determine the speed of a fan may be used to implement fan speed sensor(s) 334. In one embodiment, fan speed sensor(s) 334 comprise one or more tachometers, although this example is not intended to be limiting.

Data center management agent 336 comprises a software component executing on one or more processors of server 306 ₁ (not shown in FIG. 3). Generally speaking, data center management agent 336 performs operations that enable a remotely-executing data center management tool to collect information about various operational aspects of server 306 ₁ and that enable the remotely-executing data center management tool to modify a manner of operation of server 306 ₁.

Data center management agent 336 includes a reporting component 340. Reporting component 340 is operable to collect data concerning the operation of server fan(s) 330 and to send such data to the remotely-executing data center management tool. Such data may include, for example, a target speed of a server fan 330 as determined by fan control component 332 (or other component configured to select a target speed to which server fan 330 is to be driven) at a particular point in time or within a given timeframe as well as an actual speed of the server fan 330 as detected by a fan speed sensor 334 at the same point in time or within the same timeframe.

In an embodiment, reporting component 340 operates to intermittently collect a target speed and an actual speed of each server fan 330 and to send such target speed and actual speed data to the remotely-executing data center management tool. For example, reporting component 340 may operate to obtain such data on a periodic basis and provide it to the remotely-executing data center management tool. The exact times and/or rate at which such data collection and reporting is carried out by reporting component 340 may be fixed or configurable depending upon the implementation. In an embodiment, the remotely-executing data center management tool can specify when and/or how often such data collection and reporting should occur. The data collection and reporting may be carried out automatically by reporting component 340 and the data may then be pushed to the remotely-executing data center management tool. Alternatively, the data collection and reporting may be carried out by reporting component 340 only when the remotely-executing data center management tool requests (i.e., polls) reporting component 340 for the data.
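The push and poll modes described above might be organized as in the following sketch. The class, its method names, the `FanSample` record, and the callback-based sensor access are all hypothetical stand-ins; a real agent would read from hardware and send over a network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FanSample:
    """One paired reading for a fan (hypothetical record layout)."""
    fan_id: str
    timestamp: float
    target_rpm: float
    actual_rpm: float


class ReportingComponent:
    """Collects paired target/actual speeds and supports push or poll."""

    def __init__(self, read_target, read_actual, fan_ids, interval_s=60.0):
        self._read_target = read_target    # callable: fan_id -> target RPM
        self._read_actual = read_actual    # callable: fan_id -> actual RPM
        self._fan_ids = list(fan_ids)
        self.interval_s = interval_s       # configurable reporting period
        self._last_push = None

    def collect(self, now):
        """Read target and actual speed for every fan at the same instant."""
        return [FanSample(f, now, self._read_target(f), self._read_actual(f))
                for f in self._fan_ids]

    def poll(self, now):
        """Poll mode: the management tool asks and gets fresh samples."""
        return self.collect(now)

    def maybe_push(self, now, send):
        """Push mode: send samples only if the reporting interval has elapsed."""
        if self._last_push is not None and now - self._last_push < self.interval_s:
            return False
        send(self.collect(now))
        self._last_push = now
        return True
```

Exposing `interval_s` as a plain attribute reflects the statement that the management tool may specify how often reporting occurs; the tool would simply assign a new value.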

As will be discussed elsewhere herein, the target and actual speed data for each server fan 330 that is conveyed by reporting component 340 to the remotely-executing data center management tool can be used by the remotely-executing data center management tool to determine if a pressure anomaly exists in the data center.

Data center management agent 336 also includes a server operation management component 342. Server operation management component 342 is operable to receive instructions from the remotely-executing data center management tool, and in response to those instructions, change a manner of operation of server 306 ₁. As will be discussed elsewhere herein, the change in manner of operation of server 306 ₁ may be intended to remediate or otherwise mitigate a pressure anomaly that has been detected within the data center in which server 306 ₁ resides. The ways in which server operation management component 342 may change the manner of operation of server 306 ₁ may include but are not limited to: changing (e.g., reducing) a speed of one or more server fans 330, causing data center management agent 336 to begin monitoring and reporting the temperature of internal server components to the remotely-executing data center management tool (or increasing a rate at which such monitoring/reporting occurs), terminating at least one process executing on server 306 ₁ and/or discontinuing the use of at least one resource of server 306 ₁ (e.g., pursuant to the migration of a customer workflow to another server), reducing an amount of power supplied to one or more internal components of server 306 ₁, or shutting down server 306 ₁ entirely.
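The kinds of instructions listed above could be dispatched roughly as follows. This is a toy sketch: the `Action` names, the `apply` method, and the recorded state fields are invented for illustration, and the handlers only record what would be done rather than touching real hardware.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical instruction types, one per example listed above."""
    SET_FAN_SPEED = "set_fan_speed"
    REPORT_TEMPERATURES = "report_temperatures"
    STOP_PROCESS = "stop_process"
    REDUCE_POWER = "reduce_power"
    SHUT_DOWN = "shut_down"


class ServerOperationManager:
    """Toy stand-in that records state instead of controlling a server."""

    def __init__(self):
        self.fan_rpm = 3000
        self.temp_report_interval_s = None  # None: not reporting temperatures
        self.stopped_processes = []
        self.power_limit_w = None
        self.running = True

    def apply(self, action, value=None):
        """Change the (simulated) manner of operation per one instruction."""
        if action is Action.SET_FAN_SPEED:
            self.fan_rpm = max(0, int(value))
        elif action is Action.REPORT_TEMPERATURES:
            self.temp_report_interval_s = float(value)
        elif action is Action.STOP_PROCESS:
            self.stopped_processes.append(str(value))
        elif action is Action.REDUCE_POWER:
            self.power_limit_w = float(value)
        elif action is Action.SHUT_DOWN:
            self.running = False
        else:
            raise ValueError(f"unsupported action: {action!r}")
```

A management tool might, for instance, issue `SET_FAN_SPEED` with a reduced value first and escalate to `SHUT_DOWN` only if the anomaly persists; that ordering is a design choice, not something the description mandates.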

As further shown in FIG. 3, computing device 302 includes a data center management tool 310. Data center management tool 310 comprises a software component that is executed by one or more processors of computing device 302 (not shown in FIG. 3). Generally speaking, data center management tool 310 is operable to collect operational data from each of servers 306 ₁-306 _(N) relating to the target and actual fan speeds of those servers and to use such operational data to determine if a pressure anomaly exists within the data center in which those servers are located. Furthermore, data center management tool 310 is operable to take certain actions in response to determining that such a pressure anomaly exists such as generating an alert and/or changing a manner of operation of one or more of servers 306 ₁-306 _(N) in a manner intended to remediate the anomaly.

Data center management tool 310 includes a fan monitoring component 312, a pressure anomaly detection component 318, and a pressure anomaly response component 320. In order to perform its operations, data center management tool 310 is operable to access real-time fan data 314 and fan reference data 316. Real-time fan data 314 and fan reference data 316 may each be stored in volatile and/or non-volatile memory within computing device 302 or may be stored in one or more volatile and/or non-volatile memory devices that are external to computing device 302 and communicatively connected thereto for access thereby. Real-time fan data 314 and fan reference data 316 may each be data that is stored separately from data center management tool 310 and accessed thereby or may be data that is internally stored with respect to data center management tool 310 (e.g., within one or more data structures of data center management tool 310).

Fan monitoring component 312 is operable to collect information from a reporting component installed on each of servers 306 ₁-306 _(N) (e.g., reporting component 340 installed on server 306 ₁), wherein such information includes operational information about one or more server fans on each of servers 306 ₁-306 _(N). As was previously described, such information may include target speed and actual speed data for each monitored server fan on servers 306 ₁-306 _(N). Fan monitoring component 312 stores such operational information as part of real-time fan data 314. Such real-time fan data 314 may comprise the raw target and actual fan speed data received from servers 306 ₁-306 _(N), or it may comprise a processed version thereof. For example, fan monitoring component 312 may perform certain operations (e.g., filtering, time-averaging, smoothing, error correcting, or the like) on the raw target and actual fan speed data before storing it as real-time fan data 314.
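
By way of illustration only, one possible form of the time-averaging mentioned above is a moving average of the actual-to-target speed ratio over the most recent samples for a fan. The class name RatioSmoother and the default window size are assumptions made for this sketch; samples with a target speed of zero are skipped because the ratio is undefined for a fan that is commanded off.

```python
from collections import deque
from typing import Optional


class RatioSmoother:
    """Moving average of actual/target speed ratio for one fan (illustrative)."""

    def __init__(self, window: int = 5):
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen discards the oldest ratio automatically.
        self._ratios = deque(maxlen=window)

    def update(self, target_rpm: float, actual_rpm: float) -> Optional[float]:
        """Add one raw sample and return the smoothed ratio, if any."""
        if target_rpm > 0:
            self._ratios.append(actual_rpm / target_rpm)
        if not self._ratios:
            return None
        return sum(self._ratios) / len(self._ratios)
```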

Pressure anomaly detection component 318 is operable to compare the obtained real-time fan data 314 to fan reference data 316 to determine whether a pressure anomaly exists in the data center in which servers 306 ₁-306 _(N) reside. Fan reference data 316 is data that indicates, for each server fan that is monitored by fan monitoring component 312, how an actual speed of the server fan relates to a target speed of the server fan in a substantially pressure-neutral environment (i.e., in an environment in which the pressure at the cold air intake(s) of the server is at least roughly equivalent to the pressure at the hot air outlet(s) thereof). By comparing the target-vs-actual speed data that is obtained during operation of the server fans to the reference target-vs-actual speed data for the same server fans in a substantially pressure-neutral environment, pressure anomaly detection component 318 is able to determine whether or not a pressure anomaly exists in the data center. Specific details concerning how fan reference data 316 may be obtained and how pressure anomaly detection component 318 is able to detect pressure anomalies by comparing real-time fan data 314 to fan reference data 316 will be provided below with respect to FIGS. 5 and 6.

Pressure anomaly response component 320 is operable to perform certain actions automatically in response to the detection of a pressure anomaly by pressure anomaly detection component 318. For example, pressure anomaly response component 320 may generate an alert or send instructions to one or more of servers 306 ₁-306 _(N) to cause those servers to change their manner of operation. Such changes may be intended to remediate the pressure anomaly. Specific details concerning the automatic responses that may be performed by pressure anomaly response component 320 in response to the detection of a pressure anomaly will be provided below with respect to FIG. 7.

Depending upon the implementation, computing device 302 may be located in the same data center as servers 306 ₁-306 _(N) or may be located remotely with respect to the data center. Furthermore, it is possible that various subsets of servers 306 ₁-306 _(N) may be located in different data centers. In such a scenario, data center management tool 310 may be capable of detecting pressure anomalies in different data centers and responding to or remediating the same.

Furthermore, although data center management tool 310 is shown as part of computing device 302, in alternate implementations, data center management tool 310 may be installed and executed on any one or more of servers 306 ₁-306 _(N). For example, an instance of data center management tool 310 may be installed and executed on one of servers 306 ₁-306 _(N) and operate to perform pressure anomaly detection and remediation for servers 306 ₁-306 _(N). Alternatively, an instance of data center management tool 310 may be installed and executed on one server in each of a plurality of subsets of servers 306 ₁-306 _(N) and operate to perform pressure anomaly detection and remediation for the servers in that subset.

FIG. 3 depicts a data center management system 300 in which server fans are monitored and information obtained thereby is used to detect pressure anomalies. However, other types of fans used to dissipate heat generated by servers in a data center may be monitored in accordance with embodiments and information obtained thereby can also be used to detect pressure anomalies. By way of example only, FIG. 4 depicts an alternate data center management system 400 in which blade server chassis fans are monitored and information obtained thereby is used to detect pressure anomalies.

As shown in FIG. 4, data center management system 400 includes a computing device 402 that executes a data center management tool 410 and a plurality of blade server chassis 406 ₁-406 _(N), each of which is connected to computing device 402 via a network 404. Computing device 402, data center management tool 410, and network 404 may be substantially similar to previously-described computing device 302, data center management tool 310 and network 304, respectively, except that data center management tool 410 is configured to collect blade server chassis fan operational information as opposed to server fan operational information and to detect pressure anomalies via an analysis thereof.

To this end, data center management tool 410 includes a fan monitoring component 412, a pressure anomaly detection component 418 and a pressure anomaly response component 420 that may operate in a substantially similar manner to fan monitoring component 312, pressure anomaly detection component 318, and pressure anomaly response component 320, respectively, as described above in reference to FIG. 3. Furthermore, data center management tool 410 is operable to access real-time fan data 414 and fan reference data 416 which may be substantially similar to real-time fan data 314 and fan reference data 316 except that such data may refer to blade server chassis fans as opposed to server fans.

Blade server chassis 406 ₁-406 _(N) represent blade server chassis located within a data center. Generally speaking, each of blade server chassis 406 ₁-406 _(N) is configured to house one or more blade servers. As further shown in FIG. 4, blade server chassis 406 ₁ includes a number of components. These components include one or more blade server chassis fans 430, a fan control component 432, one or more fan speed sensors 434, and a data center management agent 436. It is to be understood that each blade server chassis 406 ₂-406 _(N) includes instances of the same or similar components, but that these have not been shown in FIG. 4 due to space constraints and for ease of illustration.

Blade server chassis fan(s) 430 comprise one or more mechanical devices that operate to produce a current of air. For example, each blade server chassis fan 430 may comprise a mechanical device that includes a plurality of blades that are radially attached to a central hub-like component and that can revolve therewith to produce a current of air. Each blade server chassis fan 430 may comprise, for example, a fixed-speed or variable-speed fan. Blade server chassis fan(s) 430 are operable to generate airflow for the purpose of dissipating heat generated by one or more blade servers installed within blade server chassis 406 ₁.

In one embodiment, blade server chassis 406 ₁ includes one or more cold air intakes and one or more hot air outlets. In further accordance with such an embodiment, each blade server chassis fan 430 may be operable to draw air into blade server chassis 406 ₁ via the cold air intake(s) and to expel air therefrom via the hot air outlet(s). In still further accordance with such an embodiment, the cold air intake(s) may be facing or otherwise exposed to a data center cold aisle and the hot air outlet(s) may be facing or otherwise exposed to a data center hot aisle. In this embodiment, each blade server chassis fan 430 is operable to draw cooled air into blade server chassis 406 ₁ from the cold aisle and expel heated air therefrom into the hot aisle.

Fan control component 432 comprises a component that operates to control a speed at which each blade server chassis fan 430 rotates. The fan speed may range from 0 RPM (i.e., the blade server chassis fan is off) to some upper limit. The different fan speeds that can be achieved by a particular blade server chassis fan will vary depending upon the fan type. Fan control component 432 may be implemented in hardware (e.g., using one or more digital and/or analog circuits), as software (e.g., software executing on one or more processors of blade server chassis 406 ₁), or as a combination of hardware and software.

Fan control component 432 may implement an algorithm for controlling the speed of each blade server chassis fan 430. For example, fan control component 432 may implement an algorithm for selecting a target fan speed for each blade server chassis fan 430 based on any number of ascertainable factors. For example, the target fan speed may be selected based on a temperature sensed by a temperature sensor internal to, adjacent to, or otherwise associated with blade server chassis 406 ₁, or based on a determined degree of usage of one or more blade server components, although these are only a few examples. It is also possible that fan control component 432 may select a target fan speed for each blade server chassis fan 430 based on external input received from a data center management tool or other entity, as will be discussed elsewhere herein.

Although only a single fan control component 432 is shown in FIG. 4, it is possible that blade server chassis 406 ₁ may include multiple fan control components. For example, blade server chassis 406 ₁ may include different fan control components that operate to control the speed of different blade server chassis fan(s), respectively.

Fan speed sensor(s) 434 comprise one or more sensors that operate to determine an actual speed at which each blade server chassis fan 430 is operating. Any type of sensor that can be used to determine the speed of a fan may be used to implement fan speed sensor(s) 434. In one embodiment, fan speed sensor(s) 434 comprise one or more tachometers, although this example is not intended to be limiting.

Data center management agent 436 comprises a software component executing on one or more processors of blade server chassis 406 ₁ (not shown in FIG. 4). Generally speaking, data center management agent 436 performs operations that enable remotely-executing data center management tool 410 to collect information about various operational aspects of blade server chassis 406 ₁ and that enable remotely-executing data center management tool 410 to modify a manner of operation of blade server chassis 406 ₁.

Data center management agent 436 includes a reporting component 440. Reporting component 440 is operable to collect data concerning the operation of blade server chassis fan(s) 430 and to send such data to remotely-executing data center management tool 410. Such data may include, for example, a target speed of a blade server chassis fan 430 as determined by fan control component 432 (or other component configured to select a target speed to which blade server chassis fan 430 is to be driven) at a particular point in time or within a given timeframe as well as an actual speed of the blade server chassis fan 430 as detected by a fan speed sensor 434 at the same point in time or within the same timeframe. In an embodiment, reporting component 440 operates to intermittently collect a target speed and an actual speed of each blade server chassis fan 430 and to send such target speed and actual speed data to remotely-executing data center management tool 410. The target and actual speed data for each blade server chassis fan 430 that is conveyed by reporting component 440 to remotely-executing data center management tool 410 can be used by remotely-executing data center management tool 410 to determine if a pressure anomaly exists in the data center.

Data center management agent 436 also includes a blade server chassis (BSC) operation management component 442. BSC operation management component 442 is operable to receive instructions from remotely-executing data center management tool 410, and in response to those instructions, change a manner of operation of blade server chassis 406 ₁. As will be discussed elsewhere herein, the change in manner of operation of blade server chassis 406 ₁ may be intended to remediate or otherwise mitigate a pressure anomaly that has been detected within the data center in which blade server chassis 406 ₁ resides. The ways in which BSC operation management component 442 may change the manner of operation of blade server chassis 406 ₁ may include but are not limited to changing (e.g., reducing) a speed of one or more blade server chassis fans 430, causing data center management agent 436 to begin monitoring and reporting the temperature of blade servers and/or blade server components to remotely-executing data center management tool 410 (or increasing a rate at which such monitoring/reporting occurs), or shutting down blade server chassis 406 ₁ entirely.

In an embodiment, a data center management agent may also be installed on each blade server installed within blade server chassis 406 ₁. These agents may be used by data center management tool 410 to carry out blade-server-specific remediation actions such as but not limited to: terminating at least one process executing on a blade server and/or discontinuing the use of at least one resource of a blade server (e.g., pursuant to the migration of a customer workflow to another server), reducing an amount of power supplied to one or more components of a blade server, or shutting down a blade server entirely.

In a further embodiment of a data center management system, server fans included in one or more servers and blade server chassis fans included in one or more blade server chassis are monitored and information obtained thereby is used to detect pressure anomalies. In further accordance with such an embodiment, remediation actions can be taken by changing the manner of operation of one or more servers, server fans, blade server chassis, blade server chassis fans, or blade servers.

FIG. 5 depicts a flowchart 500 of one example method for generating fan reference data 316, 416 as described above in reference to FIGS. 3 and 4, respectively. The method of flowchart 500 is described herein by way of example only and is not intended to be limiting. Persons skilled in the relevant art(s) will appreciate that other techniques may be used to generate fan reference data 316, 416.

As shown in FIG. 5, the method of flowchart 500 begins at step 502 in which data is obtained that indicates, for each of a plurality of fans, how an actual speed of the fan relates to a target speed of the fan in a substantially pressure-neutral environment. Each fan may comprise, for example, a server fan or a blade server chassis fan. A substantially pressure-neutral environment may comprise an environment in which the pressure at the fan inlet is roughly or substantially equivalent to the pressure at the fan outlet.

Such data may be obtained, for example, by testing a fan using a tachometer or other suitable sensor while the fan is operating in a substantially pressure-neutral environment to determine how the actual speed of the fan compares to the target speed to which the fan is being driven. Such data may also be obtained by calibrating the design of a fan so that it operates at a particular actual speed when being driven to a particular target speed. Such data may further be obtained from product specifications associated with a particular fan. The data obtained during step 502 may refer to an individual fan only or to a particular type of fan (e.g., a particular brand or model of fan).

The data that is obtained during step 502 may indicate, for a particular fan or fan type, how the actual speed of the fan relates to the desired target speed of the fan for multiple different target speeds. For example, for a variable speed fan, a range of actual fan speeds may be determined that relate to a corresponding range of target speeds. In further accordance with this example, a range of actual fan speeds may be determined that relate to a target speed range of 0 RPM to some maximum RPM.

At step 504, the data obtained during step 502 is stored in a data store or data structure that is accessible to a data center management tool, such as either of data center management tool 310 of FIG. 3 or data center management tool 410 of FIG. 4. By way of example, the data obtained during step 502 may be stored in a data store that is separate from data center management tool 310, 410 and accessed thereby or may be data that is internally stored with respect to data center management tool 310, 410 (e.g., within one or more data structures of data center management tool 310, 410).
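
By way of illustration only, reference data covering multiple target speeds may be held in a simple in-memory data structure and queried for target speeds that fall between the measured points. The class name ReferenceCurve, the method expected_actual, and the use of linear interpolation are assumptions made for this sketch; the description above does not prescribe any particular storage layout or interpolation scheme.

```python
from bisect import bisect_left
from typing import Iterable, Tuple


class ReferenceCurve:
    """Target-to-actual speed relationship for one fan or fan type,
    as measured in a substantially pressure-neutral environment."""

    def __init__(self, points: Iterable[Tuple[float, float]]):
        # points are (target_rpm, actual_rpm) pairs; duplicate targets
        # are collapsed so that interpolation never divides by zero.
        self._pts = sorted(dict(points).items())
        if len(self._pts) < 2:
            raise ValueError("at least two distinct target speeds are required")
        self._targets = [t for t, _ in self._pts]

    def expected_actual(self, target_rpm: float) -> float:
        """Reference actual speed for a target speed (clamped at the ends)."""
        if target_rpm <= self._targets[0]:
            return self._pts[0][1]
        if target_rpm >= self._targets[-1]:
            return self._pts[-1][1]
        i = bisect_left(self._targets, target_rpm)
        (x0, y0), (x1, y1) = self._pts[i - 1], self._pts[i]
        return y0 + (y1 - y0) * (target_rpm - x0) / (x1 - x0)
```

A curve of this kind could be kept per individual fan or shared by all fans of a given brand or model, matching the two granularities of reference data mentioned above.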

FIG. 6 depicts a flowchart 600 of a method for automatically detecting a pressure anomaly within a data center in accordance with an embodiment. The method of flowchart 600 may be performed, for example, by data center management tool 310 of FIG. 3 or data center management tool 410 of FIG. 4 and therefore will be described herein with continued reference to those embodiments. However, the method is not limited to those embodiments.

As shown in FIG. 6, the method of flowchart 600 begins at step 602, during which each of a plurality of fans used to dissipate heat generated by one or more servers in a data center is monitored to obtain data that indicates how an actual speed of each of the fans relates to a target speed of each of the fans. The fans may be, for example, server fans and/or blade server chassis fans. This step may be performed, for example, by fan monitoring component 312 of data center management tool 310 (as described above in reference to FIG. 3) or fan monitoring component 412 of data center management tool 410 (as described above in reference to FIG. 4). As was previously described, these components can collect such data from data center management agents executing on the servers or blade server chassis that house such servers. As was also previously described, such data may be stored as real-time fan data 314, 414.

At step 604, the data obtained during step 602 is compared to reference data that indicates, for each of the plurality of fans, how an actual speed of the fan relates to a target speed of the fan in a substantially pressure-neutral environment. This step may be performed, for example, by pressure anomaly detection component 318 of data center management tool 310 (as described above in reference to FIG. 3) or pressure anomaly detection component 418 of data center management tool 410 (as described above in reference to FIG. 4). This step may comprise, for example, comparing real-time fan data 314 to fan reference data 316 or comparing real-time fan data 414 to fan reference data 416.

At step 606, based at least on the comparison conducted during step 604, it is determined whether a pressure anomaly exists in the data center. Like step 604, this step may also be performed, for example, by pressure anomaly detection component 318 or pressure anomaly detection component 418.

The comparing carried out in step 604 between the data obtained during step 602 and the reference data may comprise, for example, determining a measure of difference or deviation between an actual-to-target speed relationship specified by the data obtained during step 602 and an actual-to-target speed relationship specified by the reference data. For example, if a particular degree of deviation between the obtained data actual-to-target speed relationship and the reference data actual-to-target speed relationship is observed or if a particular pattern of deviation is observed over time, then pressure anomaly detection component 318, 418 may determine that a pressure anomaly exists. By way of example, in a scenario in which a positive pressure is building up in a hot aisle containment unit relative to one or more adjacent cold aisles, one might expect to see that the actual speed achieved by fans blowing heated air into the hot aisle containment unit for a given target speed will be lower than that obtained for the same target speed in a substantially pressure-neutral environment.
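
By way of illustration only, one possible measure of deviation is the fractional shortfall of the observed actual speed relative to the reference actual speed for the same target speed. The function name speed_deviation and the choice of a fractional measure are assumptions made for this sketch; other measures of difference could equally be used.

```python
def speed_deviation(actual_rpm: float, reference_actual_rpm: float) -> float:
    """Fractional shortfall of the observed actual speed versus the
    reference actual speed for the same target speed.

    A positive result means the fan is turning more slowly than it would
    in a substantially pressure-neutral environment, which is the sign
    one might expect when positive pressure builds up in a hot aisle
    containment unit.
    """
    if reference_actual_rpm <= 0:
        raise ValueError("reference actual speed must be positive")
    return (reference_actual_rpm - actual_rpm) / reference_actual_rpm
```

For instance, a fan that reaches 5400 RPM where the reference data indicates 6000 RPM for the same target speed has a deviation of 0.1, i.e., a 10% shortfall.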

In one embodiment, pressure anomaly detection component 318, 418 may determine that a pressure anomaly exists if the measure of difference for a particular number of the fans exceeds a particular threshold. This approach recognizes that a pressure anomaly such as that described above (i.e., a positive pressure is building up in a hot aisle containment unit relative to one or more adjacent cold aisles) may be likely to significantly impact the behavior of a large number of fans. For example, if an N % or greater deviation from a reference actual-to-target speed relationship is observed for M % or greater of the monitored fan population, then pressure anomaly detection component 318, 418 may determine that a pressure anomaly exists. In addition to the foregoing, pressure anomaly detection component 318, 418 may consider the proximity or location of the fans for which deviations from a reference actual-to-target speed relationship are being reported.
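
By way of illustration only, the N %/M % rule described above may be sketched as follows. The function name pressure_anomaly_exists and the representation of per-fan deviations as a dictionary of fractional values are assumptions made for this sketch, and the sketch does not take fan proximity or location into account.

```python
from typing import Mapping


def pressure_anomaly_exists(
    deviations: Mapping[str, float],
    n_percent: float,
    m_percent: float,
) -> bool:
    """Return True if at least m_percent of the monitored fans deviate
    from their reference relationship by n_percent or more.

    deviations maps a fan identifier to a fractional deviation
    (e.g., 0.12 for a 12% shortfall).
    """
    if not deviations:
        return False  # nothing monitored, so nothing to flag
    flagged = sum(1 for d in deviations.values() if d * 100.0 >= n_percent)
    return flagged * 100.0 / len(deviations) >= m_percent
```

Requiring a share of the fan population to deviate, rather than any single fan, helps distinguish a room-level pressure problem from the failure of one fan.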

FIG. 7 depicts a flowchart 700 of a method for automatically taking actions in response to the detection of a pressure anomaly within a data center in accordance with an embodiment. The method of flowchart 700 may be performed, for example, by data center management tool 310 of FIG. 3 or data center management tool 410 of FIG. 4 and therefore will be described herein with continued reference to those embodiments. However, the method is not limited to those embodiments.

As shown in FIG. 7, the method of flowchart 700 begins at step 702, in which it is determined that a pressure anomaly exists in the data center. This step is analogous to step 606 of flowchart 600 and thus may be performed in a manner described above with reference to that flowchart. Step 702 may be performed, for example, by pressure anomaly detection component 318 or pressure anomaly detection component 418.

At step 704, in response to the determination in step 702 that a pressure anomaly exists in the data center, one or more actions are selectively performed. This step may be performed, for example, by pressure anomaly response component 320 of data center management tool 310 (as described above in reference to FIG. 3) or pressure anomaly response component 420 of data center management tool 410 (as described above in reference to FIG. 4).

Steps 706, 708 and 710 show various types of actions that may be selectively performed in response to the determination that a pressure anomaly exists. Each of these steps may be carried out in isolation or in conjunction with one or more other steps.

In step 706, an alert is generated. This alert may be audible, visible and/or haptic in nature. The alert may be generated, for example, via a user interface of computing device 302, computing device 402, or via a user interface of a computing device that is communicatively connected thereto. The alert may be recorded in a log. The alert may also be transmitted to another device or to a user in the form of a message, e-mail or the like. By generating an alert in this manner, data center personnel can be notified of the pressure anomaly as soon as it is detected, thereby enabling them to take steps to help remediate the issue.

In step 708, the manner of operation of at least one of the fans used to cool the servers is modified. For example, in an embodiment, pressure anomaly response component 320 may send commands to server operation management components 342 executing on servers 306 ₁-306 _(N) to cause the speed of certain server fans that are determined to be associated with the pressure anomaly to be reduced. Likewise, pressure anomaly response component 420 may send commands to BSC operation management components 442 executing on blade server chassis 406 ₁-406 _(N) to cause the speed of certain blade server chassis fans that are determined to be associated with the pressure anomaly to be reduced. This may have the effect of reducing the pressure within a hot aisle containment unit toward which the fans are blowing heated air, thereby helping to remediate the pressure anomaly.

In an embodiment, after pressure anomaly response component 320, 420 reduces the speed of one or more fans, pressure anomaly response component 320, 420 may also begin to monitor the temperature of internal server components via data center management agents 336, 436 (or increase the rate at which such information is reported) so that pressure anomaly response component 320, 420 can determine whether the reduction of the fan speed is going to cause those components to exceed specified thermal limits and potentially be damaged. If pressure anomaly response component 320, 420 determines that the reduction of the fan speed is going to cause those components to exceed specified thermal limits and potentially be damaged, then pressure anomaly response component 320, 420 may take additional steps, such as increasing fan speeds or shutting down one or more servers.
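
By way of illustration only, the thermal check that may follow a fan speed reduction can be sketched as a comparison of reported component temperatures against their specified limits. The function name follow_up_action, the returned action labels, and the safety margin margin_c are assumptions made for this sketch; the description above does not specify how close to a limit a component must be before additional steps are taken.

```python
from typing import Dict, List, Tuple


def follow_up_action(
    component_temps_c: Dict[str, float],
    thermal_limits_c: Dict[str, float],
    margin_c: float = 5.0,
) -> Tuple[str, List[str]]:
    """Decide what to do after fan speeds have been reduced.

    Returns an action label and the components that are at or within
    margin_c degrees of their specified thermal limit.
    """
    at_risk = sorted(
        name
        for name, temp in component_temps_c.items()
        if temp >= thermal_limits_c[name] - margin_c
    )
    if at_risk:
        # Additional steps: e.g., increase fan speeds or shut down servers.
        return ("increase_fan_speed_or_shut_down", at_risk)
    return ("keep_reduced_fan_speed", [])
```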

In step 710, the manner of operation of at least one of the servers in the data center is modified. For example, in an embodiment, pressure anomaly response component 320, 420 may interact with data center management agents to shut down one or more of the servers that are determined to be impacted by the pressure anomaly. As another example, to ensure that customer service level agreements (SLAs) are satisfied, pressure anomaly response component 320, 420 may operate to migrate one or more customer workflows from servers that are determined to be impacted by the pressure anomaly to servers that are not. As yet another example, pressure anomaly response component 320, 420 may interact with data center management agents to reduce an amount of power supplied to one or more internal components of a server that is determined to be impacted by the pressure anomaly, which can have the effect of reducing the temperature of such internal components.

The foregoing are only some examples of the steps that may be taken by pressure anomaly response component 320, 420 to try and remediate a detected pressure anomaly. Since such steps can be carried out automatically, they can help remediate a pressure anomaly before equipment damage is caused and without requiring intervention by data center personnel.

IV. Example Computer System Implementation

FIG. 8 depicts an example processor-based computer system 800 that may be used to implement various embodiments described herein. For example, computer system 800 may be used to implement computing device 302, any of servers 306 ₁-306 _(N), computing device 402, blade server chassis 406 ₁-406 _(N), or any of the blade servers installed therein. Computer system 800 may also be used to implement any or all of the steps of any or all of the flowcharts depicted in FIGS. 5-7. The description of computer system 800 is provided herein for purposes of illustration, and is not intended to be limiting. Embodiments may be implemented in further types of computer systems, as would be known to persons skilled in the relevant art(s).

As shown in FIG. 8, computer system 800 includes a processing unit 802, a system memory 804, and a bus 806 that couples various system components including system memory 804 to processing unit 802. Processing unit 802 may comprise one or more microprocessors or microprocessor cores. Bus 806 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. System memory 804 includes read only memory (ROM) 808 and random access memory (RAM) 810. A basic input/output system 812 (BIOS) is stored in ROM 808.

Computer system 800 also has one or more of the following drives: a hard disk drive 814 for reading from and writing to a hard disk, a magnetic disk drive 816 for reading from or writing to a removable magnetic disk 818, and an optical disk drive 820 for reading from or writing to a removable optical disk 822 such as a CD ROM, DVD ROM, BLU-RAY™ disk or other optical media. Hard disk drive 814, magnetic disk drive 816, and optical disk drive 820 are connected to bus 806 by a hard disk drive interface 824, a magnetic disk drive interface 826, and an optical drive interface 828, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the computer. Although a hard disk, a removable magnetic disk and a removable optical disk are described, other types of computer-readable memory devices and storage structures can be used to store data, such as flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like.

A number of program modules may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. These program modules include an operating system 830, one or more application programs 832, other program modules 834, and program data 836. In accordance with various embodiments, the program modules may include computer program logic that is executable by processing unit 802 to perform any or all of the functions and features of computing device 302, any of servers 306 ₁-306 _(N), computing device 402, blade server chassis 406 ₁-406 _(N), or any of the blade servers installed therein, as described above. The program modules may also include computer program logic that, when executed by processing unit 802, performs any of the steps or operations shown or described in reference to the flowcharts of FIGS. 5-7.

A user may enter commands and information into computer system 800 through input devices such as a keyboard 838 and a pointing device 840. Other input devices (not shown) may include a microphone, joystick, game controller, scanner, or the like. In one embodiment, a touch screen is provided in conjunction with a display 844 to allow a user to provide user input via the application of a touch (as by a finger or stylus for example) to one or more points on the touch screen. These and other input devices are often connected to processing unit 802 through a serial port interface 842 that is coupled to bus 806, but may be connected by other interfaces, such as a parallel port, game port, or a universal serial bus (USB). Such interfaces may be wired or wireless interfaces.

A display 844 is also connected to bus 806 via an interface, such as a video adapter 846. In addition to display 844, computer system 800 may include other peripheral output devices (not shown) such as speakers and printers.

Computer system 800 is connected to a network 848 (e.g., a local area network or wide area network such as the Internet) through a network interface or adapter 850, a modem 852, or other suitable means for establishing communications over the network. Modem 852, which may be internal or external, is connected to bus 806 via serial port interface 842.

As used herein, the terms “computer program medium,” “computer-readable medium,” and “computer-readable storage medium” are used to generally refer to memory devices or storage structures such as the hard disk associated with hard disk drive 814, removable magnetic disk 818, removable optical disk 822, as well as other memory devices or storage structures such as flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like. Such computer-readable storage media are distinguished from and non-overlapping with communication media (do not include communication media). Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wireless media such as acoustic, RF, infrared and other wireless media. Embodiments are also directed to such communication media.

As noted above, computer programs and modules (including application programs 832 and other program modules 834) may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. Such computer programs may also be received via network interface 850, serial port interface 842, or any other interface type. Such computer programs, when executed or loaded by an application, enable computer system 800 to implement features of embodiments of the present invention discussed herein. Accordingly, such computer programs represent controllers of computer system 800.

Embodiments are also directed to computer program products comprising software stored on any computer useable medium. Such software, when executed in one or more data processing devices, causes a data processing device(s) to operate as described herein. Embodiments of the present invention employ any computer-useable or computer-readable medium, known now or in the future. Examples of computer-readable mediums include, but are not limited to, memory devices and storage structures such as RAM, hard drives, floppy disks, CD ROMs, DVD ROMs, zip disks, tapes, magnetic storage devices, optical storage devices, MEMs, nanotechnology-based storage devices, and the like.

In alternative implementations, computer system 800 may be implemented as hardware logic/electrical circuitry or firmware. In accordance with further embodiments, one or more of these components may be implemented in a system-on-chip (SoC). The SoC may include an integrated circuit chip that includes one or more of a processor (e.g., a microcontroller, microprocessor, digital signal processor (DSP), etc.), memory, one or more communication interfaces, and/or further circuits and/or embedded firmware to perform its functions.

V. Additional Exemplary Embodiments

A method that is performed by data center management software executing on at least one computer is described herein. In accordance with the method, each of a plurality of fans used to dissipate heat generated by one or more servers in a data center is monitored to obtain data that indicates how an actual speed of each of the fans relates to a target speed of each of the fans. The obtained data is then compared to reference data that indicates, for each of the plurality of fans, how an actual speed of the fan relates to a target speed of the fan in a substantially pressure-neutral environment. Based on the comparison, it is determined that a pressure anomaly exists in the data center. Based on the determination that the pressure anomaly exists in the data center, one or more of the following is performed: generating an alert and modifying a manner of operation of one or more of at least one of the fans and at least one of the servers.

In an embodiment of the foregoing method, the plurality of fans comprise one or more of a server fan and a blade chassis fan.

In another embodiment of the foregoing method, each of the plurality of fans is configured to blow air into a hot aisle containment unit.

In yet another embodiment of the foregoing method, modifying the manner of operation of at least one of the fans comprises reducing a speed of at least one of the fans. In further accordance with such an embodiment, the method may further include monitoring a temperature of one or more internal components of one or more of the servers responsive to reducing the speed of the at least one of the fans.
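The reduce-then-monitor behavior of this embodiment might be sketched as follows. This is a minimal illustration only, not the described implementation: the function name, the callables `set_fan_rpm` and `read_component_temp_c`, the step size, the temperature limit, and the choice to revert the last reduction when the limit is exceeded are all assumptions introduced here.

```python
from typing import Callable


def reduce_fan_speed_with_thermal_guard(
    current_rpm: float,
    set_fan_rpm: Callable[[float], None],        # hypothetical fan control hook
    read_component_temp_c: Callable[[], float],  # hypothetical temperature sensor hook
    step_fraction: float = 0.10,  # assumed: lower speed by 10% per step
    max_temp_c: float = 85.0,     # assumed thermal limit for internal components
    min_rpm: float = 2000.0,      # assumed floor for fan speed
    max_steps: int = 5,
) -> float:
    """Lower fan speed stepwise, checking temperature after every reduction.

    Returns the last speed at which the temperature stayed within the limit.
    """
    rpm = current_rpm
    for _ in range(max_steps):
        candidate = max(min_rpm, rpm * (1.0 - step_fraction))
        if candidate == rpm:
            break  # already at the floor, nothing left to reduce
        set_fan_rpm(candidate)
        # Monitor the temperature in response to the speed reduction.
        if read_component_temp_c() > max_temp_c:
            set_fan_rpm(rpm)  # assumed policy: undo the last reduction
            break
        rpm = candidate
    return rpm
```

A caller would supply hooks backed by whatever fan control and sensor interfaces the servers expose; the revert-on-overheat policy is one possible choice among several.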

In still another embodiment of the foregoing method, modifying the manner of operation of at least one of the servers comprises migrating a customer workflow from at least one of the servers.

In a further embodiment of the foregoing method, modifying the manner of operation of at least one of the servers comprises shutting down at least one of the servers.

In a still further embodiment of the foregoing method, modifying the manner of operation of at least one of the servers comprises reducing an amount of power supplied to one or more internal components of one or more of the servers.

In an additional embodiment of the foregoing method, comparing the obtained data to the reference data comprises determining, for each of the fans, a measure of difference between an actual-to-target speed relationship specified by the obtained data and an actual-to-target speed relationship specified by the reference data. In further accordance with such an embodiment, determining that the pressure anomaly exists in the data center based on the comparison may comprise determining that the measure of difference for a particular number of the fans exceeds a particular threshold.
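One way to picture this comparison is the sketch below. It assumes, purely for illustration, that the actual-to-target speed relationship is expressed as a ratio and that the measure of difference is the absolute difference between the observed and reference ratios; the function name, the parameter names, and the default threshold and fan count are hypothetical.

```python
from typing import Mapping, Tuple


def detect_pressure_anomaly(
    observed: Mapping[str, Tuple[float, float]],  # fan id -> (actual rpm, target rpm)
    reference_ratio: Mapping[str, float],         # fan id -> actual/target ratio at neutral pressure
    diff_threshold: float = 0.05,  # assumed "particular threshold"
    min_fan_count: int = 2,        # assumed "particular number of the fans"
) -> bool:
    """Return True when enough fans deviate from their reference relationship."""
    exceeding = 0
    for fan_id, (actual_rpm, target_rpm) in observed.items():
        if target_rpm <= 0:
            continue  # no meaningful ratio for a fan with no target speed
        # Measure of difference between observed and reference relationships.
        diff = abs(actual_rpm / target_rpm - reference_ratio[fan_id])
        if diff > diff_threshold:
            exceeding += 1
    return exceeding >= min_fan_count
```

Requiring several fans to deviate, rather than one, keeps a single failing fan from being read as a room-level pressure problem; other relationship models and difference measures would fit the same structure.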

A system is also described herein. The system includes at least one processor and a memory. The memory stores computer program logic for execution by the at least one processor. The computer program logic includes one or more components configured to perform operations when executed by the at least one processor. The one or more components include a fan monitoring component, a pressure anomaly detection component and a pressure anomaly response component. The fan monitoring component is operable to monitor each of a plurality of fans used to dissipate heat generated by one or more servers in a data center to obtain data that indicates how an actual speed of each of the fans relates to a target speed of each of the fans. The pressure anomaly detection component is operable to compare the obtained data to reference data that indicates, for each of the plurality of fans, how an actual speed of the fan relates to a target speed of the fan in a substantially pressure-neutral environment and, based on the comparison, determine that a pressure anomaly exists in the data center. The pressure anomaly response component is operable to perform one or more of the following in response to a determination that the pressure anomaly exists: (i) generate an alert and (ii) modify a manner of operation of one or more of at least one of the fans and at least one of the servers.
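The division of labor among the three components could be wired together roughly as shown below. The class names mirror the component names in the text, but every field, method name, and data shape is an assumption made for this sketch, and only the alert action of the response component is illustrated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumed data shape: fan id -> (actual rpm, target rpm).
Readings = Dict[str, Tuple[float, float]]


@dataclass
class FanMonitoringComponent:
    read_fans: Callable[[], Readings]  # hypothetical telemetry source

    def obtain_data(self) -> Readings:
        return self.read_fans()


@dataclass
class PressureAnomalyDetectionComponent:
    reference: Readings  # readings taken in a substantially pressure-neutral environment
    compare: Callable[[Readings, Readings], bool]  # hypothetical comparison rule

    def anomaly_exists(self, obtained: Readings) -> bool:
        return self.compare(obtained, self.reference)


@dataclass
class PressureAnomalyResponseComponent:
    alerts: List[str] = field(default_factory=list)

    def respond(self) -> None:
        # Only alert generation is sketched; fan or server changes are omitted.
        self.alerts.append("pressure anomaly detected")


def run_once(
    monitor: FanMonitoringComponent,
    detector: PressureAnomalyDetectionComponent,
    responder: PressureAnomalyResponseComponent,
) -> bool:
    """One monitor -> detect -> respond pass; returns whether an anomaly was found."""
    obtained = monitor.obtain_data()
    anomaly = detector.anomaly_exists(obtained)
    if anomaly:
        responder.respond()
    return anomaly
```

Injecting the telemetry source and the comparison rule keeps each component independently replaceable, which matches the way the text describes them as separate pieces of program logic.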

In an embodiment of the foregoing system, the pressure anomaly response component is operable to modify the manner of operation of at least one of the fans by reducing a speed of at least one of the fans. In further accordance with such an embodiment, the pressure anomaly response component may be further operable to monitor a temperature of one or more internal components of one or more of the servers responsive to reducing the speed of the at least one of the fans.

In another embodiment of the foregoing system, the pressure anomaly response component is operable to modify the manner of operation of at least one of the servers by migrating at least one service or resource from at least one of the servers.

In yet another embodiment of the foregoing system, the pressure anomaly response component is operable to modify the manner of operation of at least one of the servers by shutting down at least one of the servers.

In still another embodiment of the foregoing system, the pressure anomaly response component is operable to modify the manner of operation of at least one of the servers by reducing an amount of power supplied to one or more internal components of one or more of the servers.

In a further embodiment of the foregoing system, the pressure anomaly detection component is operable to compare the obtained data to the reference data by determining, for each of the fans, a measure of difference between an actual-to-target speed relationship specified by the obtained data and an actual-to-target speed relationship specified by the reference data. In further accordance with such an embodiment, the pressure anomaly detection component may be operable to determine that the pressure anomaly exists in the data center based on the comparison by determining that the measure of difference for a particular number of the fans exceeds a particular threshold.

A computer program product is also described herein. The computer program product comprises a computer-readable memory having computer program logic recorded thereon that when executed by at least one processor causes the at least one processor to perform a method that includes: monitoring each of a plurality of fans used to dissipate heat generated by one or more servers in a data center to obtain data that indicates how an actual speed of each of the fans relates to a target speed of each of the fans; determining that a pressure anomaly exists in the data center based on at least the obtained data; and based on the determination that the pressure anomaly exists in the data center, performing one or more of: generating an alert; and modifying a manner of operation of one or more of: at least one of the fans; and at least one of the servers.

In one embodiment of the foregoing computer program product, determining that the pressure anomaly exists in the data center based on at least the obtained data comprises comparing the obtained data to reference data that indicates, for each of the plurality of fans, how an actual speed of the fan relates to a target speed of the fan in a substantially pressure-neutral environment.

VI. Conclusion

While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art(s) that various changes in form and details can be made therein without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

What is claimed is:
1. A method performed by data center management software executing on at least one computer, comprising: monitoring each of a plurality of fans used to dissipate heat generated by one or more servers in a data center to obtain data that indicates how an actual rotational speed of each of the fans as determined by a fan speed sensor relates to a target rotational speed of each of the fans; comparing the obtained data to reference data that indicates, for each of the plurality of fans, how an actual rotational speed of the fan relates to a target rotational speed of the fan in a substantially pressure-neutral environment; based on the comparison, determining that a pressure anomaly exists in the data center; and based on the determination that the pressure anomaly exists in the data center, modifying a manner of operation of one or more of: at least one of the fans; and at least one of the servers.
2. The method of claim 1, wherein the plurality of fans comprise one or more of: a server fan; and a blade chassis fan.
3. The method of claim 1, wherein each of the plurality of fans is configured to blow air into a hot aisle containment unit.
4. The method of claim 1, wherein modifying the manner of operation of at least one of the fans comprises: reducing a speed of at least one of the fans.
5. The method of claim 4, further comprising: monitoring a temperature of one or more internal components of one or more of the servers responsive to reducing the speed of the at least one of the fans.
6. The method of claim 1, wherein modifying the manner of operation of at least one of the servers comprises: migrating a customer workflow from at least one of the servers.
7. The method of claim 1, wherein modifying the manner of operation of at least one of the servers comprises: shutting down at least one of the servers.
8. The method of claim 1, wherein modifying the manner of operation of at least one of the servers comprises: reducing an amount of power supplied to one or more internal components of one or more of the servers.
9. The method of claim 1, wherein comparing the obtained data to the reference data comprises determining, for each of the fans, a measure of difference between an actual-to-target speed relationship specified by the obtained data and an actual-to-target speed relationship specified by the reference data.
10. The method of claim 9, wherein determining that the pressure anomaly exists in the data center based on the comparison comprises: determining that the measure of difference for a particular number of the fans exceeds a particular threshold.
11. A system comprising: at least one processor; and a memory that stores computer program logic for execution by the at least one processor, the computer program logic including one or more components configured to perform operations when executed by the at least one processor, the one or more components including: a fan monitoring component that is operable to monitor each of a plurality of fans used to dissipate heat generated by one or more servers in a data center to obtain data that indicates how an actual rotational speed of each of the fans as determined by a fan speed sensor relates to a target rotational speed of each of the fans; a pressure anomaly detection component that is operable to compare the obtained data to reference data that indicates, for each of the plurality of fans, how an actual rotational speed of the fan relates to a target rotational speed of the fan in a substantially pressure-neutral environment and, based on the comparison, determine that a pressure anomaly exists in the data center; and a pressure anomaly response component that is operable to, in response to a determination that the pressure anomaly exists, modify a manner of operation of one or more of at least one of the fans and at least one of the servers.
12. The system of claim 11, wherein the pressure anomaly response component is operable to modify the manner of operation of at least one of the fans by reducing a speed of at least one of the fans.
13. The system of claim 12, wherein the pressure anomaly response component is further operable to monitor a temperature of one or more internal components of one or more of the servers responsive to reducing the speed of the at least one of the fans.
14. The system of claim 11, wherein the pressure anomaly response component is operable to modify the manner of operation of at least one of the servers by migrating a customer workflow from at least one of the servers.
15. The system of claim 11, wherein the pressure anomaly response component is operable to modify the manner of operation of at least one of the servers by shutting down at least one of the servers.
16. The system of claim 11, wherein the pressure anomaly response component is operable to modify the manner of operation of at least one of the servers by reducing an amount of power supplied to one or more internal components of one or more of the servers.
17. The system of claim 11, wherein the pressure anomaly detection component is operable to compare the obtained data to the reference data by determining, for each of the fans, a measure of difference between an actual-to-target speed relationship specified by the obtained data and an actual-to-target speed relationship specified by the reference data.
18. The system of claim 17, wherein the pressure anomaly detection component is operable to determine that the pressure anomaly exists in the data center based on the comparison by determining that the measure of difference for a particular number of the fans exceeds a particular threshold.
19. A computer program product comprising a computer-readable memory having computer program logic recorded thereon that when executed by at least one processor causes the at least one processor to perform a method comprising: monitoring each of a plurality of fans used to dissipate heat generated by one or more servers in a data center to obtain data that indicates how an actual rotational speed of each of the fans as determined by a fan speed sensor relates to a target rotational speed of each of the fans; comparing the obtained data to reference data that indicates, for each of the plurality of fans, how an actual rotational speed of the fan relates to a target rotational speed of the fan in a substantially pressure-neutral environment; determining that a pressure anomaly exists in the data center based on at least the comparison; and based on the determination that the pressure anomaly exists in the data center, modifying a manner of operation of one or more of: at least one of the fans; and at least one of the servers.
20. The computer program product of claim 19, wherein comparing the obtained data to the reference data comprises determining, for each of the fans, a measure of difference between an actual-to-target speed relationship specified by the obtained data and an actual-to-target speed relationship specified by the reference data.